Multimers of S. solfataricus single-stranded DNA-binding protein and methods of use thereof

ABSTRACT

The invention provides multimers of S. solfataricus ssDNA binding protein that bind single stranded DNA. The multimers are robust and stable reagents for use in PCR and other techniques for engineering DNA. The invention further provides methods for performing nucleic acid amplification and engineering using the multimers.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/386,575, filed Mar. 11, 2003, the entire contents of which are incorporated herein by reference.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

This invention was made with government support under Grant (or Contract) No. GM62653 awarded by the National Institutes of Health and Grant (or Contract) No. 0074380 awarded by the National Science Foundation. The government has certain rights in this invention.

REFERENCE TO A “SEQUENCE LISTING,” A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED ON A COMPACT DISK

NOT APPLICABLE

FIELD OF THE INVENTION

This invention relates to single stranded DNA binding proteins that are robust reagents for use in nucleic acid amplification reactions.

BACKGROUND OF THE INVENTION

Single-stranded DNA (ssDNA) binding proteins (SSBs) are essential in most intracellular interactions that involve DNA, including replication, repair, and recombination (Kowalczykowski, S. C. et al., Microbiol Rev 58:401-465 (1994); Lohman, T. M. et al., Annu Rev Biochem 63:527-570 (1994) (“Lohman 1994”)). Homologues of this class of proteins were identified in all three domains of life, as well as in viral genomes (Chedin, F. et al., Trends Biochem Sci 23:273-277 (1998) (“Chedin 1998”); Iftode, C. et al., Crit Rev Biochem Mol Biol 34:141-180 (1999); Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998); Kowalczykowski, S. C. et al., Single-stranded DNA binding proteins. in The Enzymes. Boyer, P. D. (ed.) New York: Academic Press, pp. 373-442 (1981); Lohman 1994; Wold, M. S. Annu Rev Biochem 66:61-92 (1997)). Despite the lack of strong homology at the amino acid level, preservation of both structural and domain organization suggests that SSBs are derived from a common evolutionary ancestor (Chedin 1998; Pfuetzner, R. A. et al., J Biol Chem 272:430-434 (1997); Raghunathan, S. et al., Nat Struct Biol 7:648-652 (2000)). While functionally equivalent, eubacterial SSB and the eukaryotic version, RPA, have distinctly different quaternary structures. In eubacteria, SSB is encoded by a single gene and the active form of the protein is a homotetramer in which each monomer provides one ssDNA-binding domain (Lohman 1994). In eukaryotes, the RPA complex from both humans and yeast is composed of three distinct subunits which together provide a total of four ssDNA-binding domains (Brill, S. J. et al., Mol Cell Biol 18:7225-7234 (1998)).

Archaea are a separate group of organisms distinguished from the eubacteria through 16S rDNA sequence analysis. These prokaryotes are further subdivided into three diverse groups named the crenarchaeota, the euryarchaeota, and the korarchaeota (Barns, S. M. et al., Proc Natl Acad Sci USA, 93:9188-9193 (1996); Woese, C. R. et al., Proc Natl Acad Sci USA 87:4576-4579 (1990); Woese, C. R. et al., Proc Natl Acad Sci USA 74:5088-5090 (1977)). Only members of the crenarchaeal and euryarchaeal groups, however, have been cultivated. Genomic studies suggest a significant evolutionary division between metabolic and informational processes in archaea. While most intermediary metabolic processes strongly resemble those observed in eubacteria, genomic informational processes are generally thought to be more closely related to those found in eukaryotes (reviewed in (Doolittle, W. F. et al., Curr Biol 8:R209-211 (1998))).

Archaea utilize eukaryotic B-type DNA polymerases for replication, and their ribosomal proteins, as well as translation initiation factors, are remarkably eukaryotic. The recent identification of archaeal snoRNA genes reveals an unexpected eukaryotic connection (Omer, A. D. et al., Science 288:517-522 (2000)). Transcription also involves eukaryotic protein homologues, but the discovery of multiple TBP and TFB proteins in halophiles hints at a unique archaeal transcription mechanism (Baliga, N. S. et al., Mol Microbiol, 36:1184-1185 (2000)). Examination of archaeal recombination proteins suggests a definite similarity with eukaryotes. The archaeal DNA strand exchange protein RadA is more similar to its eukaryotic counterpart, Rad51 protein, than to the eubacterial RecA protein, both at the amino acid level (Sandler, S. J. et al., J Bacteriol 181:907-915 (1999)) and at the biochemical level (Seitz, E. M. et al., Genes Dev 12:1248-1253 (1998)). The associated RadB protein is proposed to serve as a simpler archaeal version of eukaryotic Rad55/57 protein (DiRuggiero, J. et al., J Mol Evol 49:474-484 (1999); Komori, K. et al., J Biol Chem 275:33782-33790 (2000); Rashid, N. et al., Mol Gen Genet 253:397-400 (1996)), and characterization of archaeal Holliday junction resolvase proteins suggests they may also be eukaryotic in nature (Komori, K. et al., Proc Natl Acad Sci USA 96:8873-8878 (1999); Kvaratskhelia, M. et al., J Mol Biol 297:923-932 (2000)).

Recent descriptions of archaeal SSB homologues from the euryarchaeal branch of the archaeal domain demonstrate that their amino acid sequences are more similar to eukaryotic RPA than to eubacterial SSB (Chedin 1998; Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)). However, these archaeal proteins maintain multiple ssDNA-binding domains within one or just a pair of polypeptides and therefore are expected to function as monomers or heterodimers rather than as a homotetramer (as does E. coli SSB). It was proposed that these archaeal RPA homologues are evolutionarily related to eukaryotic RPA through gene duplication and recombination events (Chedin 1998).

Genome sequencing of several archaeons simplified molecular analysis in these organisms. While a number of euryarchaeal genome sequences have been determined, to date the only publicly available crenarchaeal genome sequences are those of Aeropyrum pernix (Kawarabayasi, Y. et al., DNA Res 6:83-101:145-152 (1999)) and Sulfolobus solfataricus (She, Q. et al., Proc Natl Acad Sci USA 98:7835-7840 (2001)). In 2001, Wadsworth and White described the identification of an ssDNA binding protein from S. solfataricus. Wadsworth and White, Nuc Acids Res 29(4):914-920 (2001).

BRIEF SUMMARY OF THE INVENTION

This invention provides isolated multimers, wherein each unit of said multimer has at least 70% sequence identity to SEQ ID NO:1, and wherein the multimer binds single stranded DNA. In some embodiments, each unit of the multimer has at least 80% sequence identity to SEQ ID NO:1. In other embodiments, each unit of the multimer has at least 90% sequence identity to SEQ ID NO:1. The invention further provides embodiments wherein each unit has the sequence of SEQ ID NO:1. In some preferred embodiments, the multimer is a tetramer.

In another important group of embodiments, the invention provides methods of performing nucleic acid amplification, said method comprising contacting a single stranded DNA with a multimeric protein, wherein each unit of the multimeric protein has at least 70% sequence identity to SEQ ID NO:1, and wherein said multimer binds single stranded DNA. In some of these embodiments, each unit of the multimeric protein has at least 80% sequence identity to SEQ ID NO:1, while in others, each unit of the multimeric protein has at least 90% sequence identity to SEQ ID NO:1. In some embodiments, each unit of the multimeric protein has the sequence of SEQ ID NO:1. The nucleic acid amplification can be, for example, polymerase chain reaction, ligase chain reaction, transcription-based amplification system, or self-sustained sequence replication system. In some embodiments, the method of nucleic acid amplification is polymerase chain reaction.

The invention further provides methods for performing nucleic acid engineering, comprising contacting single stranded DNA with a multimeric protein, wherein each unit of the multimeric protein has at least 70% sequence identity to SEQ ID NO:1, and wherein said multimer binds single stranded DNA. In some of these embodiments, each unit of the multimeric protein has at least 80% sequence identity to SEQ ID NO:1, while in others, each unit of the multimeric protein has at least 90% sequence identity to SEQ ID NO:1. In some embodiments, each unit of the multimeric protein has the sequence of SEQ ID NO:1. The nucleic acid engineering can be, for example, PCR-based DNA sequencing, recombination mediated cloning, PCR-mediated gene replacement, PCR-mediated recombination, RT-PCR cDNA synthesis, or in vitro sequence mutagenesis.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Alignment of ssDNA-binding domain protein sequences. Crenarchaeal sequences are aligned with the first ssDNA-binding domains of M. jannaschii RPA (SEQ ID NO:5), S. cerevisiae RPA70 (SEQ ID NO:6), and H. sapiens RPA70 (SEQ ID NO:7). Identical residues are shaded black and conserved residues are shaded gray. The * symbol indicates residues identified in the H. sapiens RPA70 that interact with DNA (Bochkarev, A. et al., Nature, 385:176-181 (1997)), while an x indicates residues that are identical between the two crenarchaeal sequences. The consensus is shown beneath the alignment. Sequence accession numbers are as follows: S. solfataricus SSB (portion shown in Figure is SEQ ID NO:3), SSO2364; A. pernix SSB (portion shown in Figure is SEQ ID NO:4), gi5105001; M. jannaschii RPA, MJ1159; S. cerevisiae RPA70, gi6319321; H. sapiens RPA70, gi1350579.

FIG. 2. Schematic representation of the domain architecture of ssDNA-binding proteins in all three domains of life. Homologous DNA-binding domains are represented by shaded boxes and the location of the zinc-finger motif is indicated. A summary of the attributes of each type of ssDNA-binding protein is represented in the boxes. A range of percentage similarities for proteins from each domain to the first single-stranded DNA-binding domain of S. cerevisiae RPA70 was determined using the BestFit program and is indicated.

FIG. 3. SDS-PAGE of the purified SsoSSB protein. Samples were subjected to SDS-PAGE and gels were stained with Coomassie brilliant blue R250. The samples loaded were uninduced crude cell sonicate (lane 1, 100 μg protein); induced crude cell sonicate (lane 2, 100 μg protein); heat-treated clarified sonicate (lane 3, 80 μg protein); pooled ssDNA-cellulose fractions (lane 4, 15 μg protein); and concentrated Resource Q fractions (lane 5, 15 μg protein). The arrow indicates the position of the SsoSSB protein.

FIG. 4 consists of FIGS. 4A and 4B, showing gel filtration chromatography of SsoSSB protein. FIG. 4A. Elution of purified SsoSSB protein relative to molecular weight standards: BSA (66 kDa), carbonic anhydrase (29 kDa), and cytochrome C (12.4 kDa), which are represented by closed squares. Ve/Vo, elution volume divided by void volume. FIG. 4B. A representative elution profile for purified SsoSSB protein. Closed squares represent optical density at 280 nm (OD₂₈₀) for the elution volume indicated.

FIG. 5. Gel mobility-shift analysis of SsoSSB protein binding to ssDNA. Increasing concentrations of protein (0.04 μM to 20 μM) were added to a constant concentration of 63-mer oligonucleotide (10 μM nucleotides).

FIG. 6 consists of FIGS. 6A and 6B, and shows that overexpression of SsoSSB protein rescues the lethal phenotype of an E. coli ssb-1 mutation. Both Figures: KLC789 cells containing the pTara arabinose-inducible T7 expression plasmid, and either pET21a (circles) or the SsoSSB expression vector (squares) were grown at 30° C. in either arabinose (closed symbols) or glucose (open symbols). Cultures were shifted to 43° C. at the point indicated by the arrow. FIG. 6A. Optical densities were monitored spectrophotometrically at 600 nm. FIG. 6B. Colony forming units (cfu) were determined by plating in triplicate and the points shown are averages of the replicates.

FIG. 7 consists of FIGS. 7A and 7B and shows that SsoSSB protein stimulates DNA strand exchange by E. coli RecA protein. FIG. 7A. A schematic representation of the formation of nicked circular dsDNA and joint molecules from starting substrates. FIG. 7B. Photo of gel. Lane 1, no protein; lane 2, RecA protein; lane 3, RecA protein and SsoSSB protein; lane 4, RecA protein and E. coli SSB protein. The abbreviations are: JM, joint molecules; NC, nicked circular dsDNA; DS, dsDNA; and SS, ssDNA.

FIG. 8 consists of FIGS. 8A and 8B. FIG. 8A sets forth the amino acid sequence of SsoSSB protein. FIG. 8B sets forth the nucleotide sequence encoding SsoSSB protein.

DETAILED DESCRIPTION

I. Introduction

The crenarchaeon S. solfataricus is a hyperthermophilic aerobe that grows in sulfur hot springs. Its optimal growth conditions are temperatures of 70-90° C. and pH levels from 2-4. The entire genome of the organism was sequenced and published in 2001. She et al., “The complete genome of the crenarchaeon Sulfolobus solfataricus P2,” Proc. Natl. Acad Sci (USA) 98:7835-7840 (2001). In 2001, Wadsworth and White reported the identification of a single-stranded DNA (ssDNA) binding protein (SSB) from S. solfataricus. Wadsworth and White, Nuc Acids Res 29(4):914-920 (2001). The work from this laboratory indicates that the SSB is present as a monomer.

The S. solfataricus ssDNA binding protein (“SsoSSB protein”) consists of 148 amino acids with 47% identity and 69% similarity (that is, the residues are either identical or conservative substitutions for one another) to the SSB of another crenarchaeon, A. pernix. The amino acid sequence of SsoSSB (SEQ ID NO:1) is shown in FIG. 8A, and is available in the National Center for Biotechnology Information Entrez database under accession numbers NP_343725 and AAK42515.

Surprisingly, it has been discovered that the monomers of S. solfataricus ssDNA binding protein (“SsoSSB protein”) associate with one another to form a complex (referred to herein as a “multimeric protein” or a “multimer”) in solution, and it is the complex, or multimeric protein, that is functional in binding ssDNA. The multimeric proteins are, however, composed of monomers of the SsoSSB protein. The multimers are composed of monomers in multiples of 2. It is believed that the multimers are not composed of more than 24 monomers of SsoSSB, and are more commonly composed of 12 or fewer. Data from the studies reported in the Examples indicate that SsoSSB is present in solution primarily as dimers and tetramers, with tetramers being the prevalent multimeric form present. The most active form of the protein is therefore a homotetramer in which each monomer provides one ssDNA-binding domain. While S. solfataricus SSB has no sequence homology to the E. coli SSB protein, it therefore physically acts more like the SSBs found in eubacteria, which multimerize, than like those found in eukaryotes and the SSBs of other archaeons identified to date.

Accordingly, the present invention provides multimers comprising monomers of single stranded DNA binding protein of S. solfataricus or of defined variations of the SsoSSB protein that retain the ability to bind ssDNA. In preferred forms, the multimer is a dimer (that is, an assembly of two monomers) or a tetramer (that is, an assembly of four monomers), with the tetramer form being the most preferred. The multimers function to bind ssDNA.

II. Uses of the ssDNA-Binding Proteins of the Invention

Members of the Archaea typically live in conditions of extreme heat, pH, or salt concentrations. Thus, they offer a source of enzymes and other proteins which can be useful reagents in assays and other commercial reactions. As noted above, the crenarchaeon S. solfataricus grows optimally at temperatures between 70-90° C. Thus, its proteins are particularly adapted for use at temperatures which are high for reagents originating from biological sources. More specifically, the DNA replication of S. solfataricus takes place at high temperature. The multimers are nonetheless active over a wide range of temperatures, and can be used at temperatures from as low as 37° C. to as high as 65° C., and even as high as 90° C.

During polymerase chain reaction (PCR), this activity permits DNA polymerase to replicate more of the DNA template strand in each PCR cycle than would be replicated in the absence of the protein, thereby increasing yield. Moreover, temperature-resistant proteins such as SsoSSB protein multimers are not inactivated by the temperature cycling which is part of the PCR process, and thus do not have to be replaced before the next reaction can proceed. This enhances the ability to automate the procedures. Thus, use of heat-resistant ssDNA-binding proteins, like the multimers provided here, not only increases the yield of each PCR cycle, but also permits automation of the overall process and increases the speed with which cycles can be conducted.

Additionally, archaeal proteins are much more stable than most eukaryotic and bacterial proteins. For example, SSB protein from E. coli must be stored at −80° C. If refrigerated, it loses some or all of its activity within a month and at room temperature, it loses some or all of its activity within 72 hours. By contrast, SsoSSB retains its activity at room temperature for at least three weeks, and can be refrigerated for over a year without loss of activity. The stability of the protein therefore makes it convenient for use even in protocols in which high temperatures are not required.

SsoSSB protein multimers are therefore robust reagents useful for a variety of biotechnical applications involving amplification or engineering of nucleic acids, or both. The multimers are expected to be useful in a number of such techniques well known in the art, such as PCR, ligase chain reaction, transcription-based amplification system, self-sustained sequence replication system, PCR-based DNA sequencing, recombination mediated cloning, PCR-mediated gene replacement, PCR-mediated recombination, reverse transcriptase (RT)-PCR cDNA synthesis, and in vitro sequence mutagenesis. For convenience, techniques which employ the binding of ssDNA in the course of manipulating nucleic acids, such as PCR-based DNA sequencing, recombination mediated cloning, PCR-mediated gene replacement, PCR-mediated recombination, reverse transcriptase (RT)-PCR cDNA synthesis, and in vitro sequence mutagenesis, are sometimes referred to herein as “nucleic acid engineering.”

III. Modifications of S. solfataricus ssDNA Binding Protein Multimers.

It is understood that the S. solfataricus ssDNA binding protein can be modified and still retain the desired robustness and ssDNA binding function. In some embodiments, the amino acid sequence has at least 70% identity to that of SEQ ID NO:1. In other embodiments, the amino acid sequence has 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or even 99% identity to SEQ ID NO:1 (with each increasing percentage of identity being more preferred), and retains the ability to bind ssDNA.

It should be noted that the monomers composing a particular multimer need not be exact duplicates of one another. Thus, while a dimer or a tetramer of the invention is for convenience considered a homodimer or a homotetramer, the monomers of the dimer or tetramer may not be precisely identical. In a given tetramer, for example, one monomer might have the native sequence of SEQ ID NO:1, the second monomer might have a conservative substitution compared to the native sequence (SEQ ID NO:1) and the third and fourth monomers might have only 80% and 90% sequence identity, respectively, to the native sequence. Or, all four monomers of the tetramer might be close to the sequence of SEQ ID NO:1, but each might have several conservative substitutions of residues of SEQ ID NO:1. Or, all four monomers might have the same two conservative substitutions and otherwise have the sequence of SEQ ID NO:1, or all four might have the sequence of native SEQ ID NO:1, in which case all the monomers are identical to one another. Whatever the exact composition of the individual monomers, however, what is important is that the multimer they form retains the desired ability to bind ssDNA.

Persons of skill are also aware that ssDNA binding proteins typically contain a motif known as an oligonucleotide/oligosaccharide binding (“OB”) fold. OB-fold proteins are a superfamily of proteins having common structural features; both the superfamily and the common structural features are well known in the art. See, e.g., Callebaut and Mornon, Biochem J 321:125-32 (1997). Williamson et al., Biochemistry 33:11745-59 (1994) summarize some of these features: “[t]he common structural features include the number of beta-strands and their arrangement, the beta-barrel shear number, an interstrand hydrogen bond network, the packing of the hydrophobic core, and a conserved beta-bulge.” One feature of ssDNA-binding OB-fold proteins is the presence of a channel in the protein so sized as to permit binding of ssDNA along the channel. As noted by Bochkarev et al., Nature 385:176-81 (1997): “[t]he ssDNA lies in a channel that extends from one subdomain to the other.” For ease in visualization, the protein channel is sometimes described in the art as similar in conformation to a hand curled around a glass.

SsoSSB protein is a monomer, which contains an OB-fold between residues 32-71. The monomers assemble into a multimer with structural features typical of OB-fold proteins which bind ssDNA. Multimers of SsoSSB proteins form a channel so sized as to permit binding of the ssDNA along the channel. The carboxyl terminus of the protein monomers also contains a number of acidic residues.

FIG. 1 shows an alignment of SsoSSB protein with other OB-fold ssDNA binding proteins. As noted in the Description of FIG. 1, the SsoSSB residues shown on a black background are identical among these proteins. These residues can be assumed to be important for protein function and their substitution in an SsoSSB monomer is therefore generally less favored. As also noted in the Description of FIG. 1, residues shown on a gray background are conservative substitutions among the proteins. Thus, it is expected that other conservative substitutions of these residues in an SsoSSB monomer will likely result in a functional ssDNA binding protein multimer. The residues shown in FIG. 1 on a normal, white background are not conserved among the various ssDNA binding proteins aligned in the Figure; these residues can generally undergo substitution. In preferred embodiments, the substitutions are conservative substitutions. Substitutions can also generally be made outside of the OB-fold region defined by residues 32-71. Any particular substitution can be readily tested, for example by the assays set forth in the Examples, to confirm that the substitution does not decrease the ability to bind ssDNA below any particular degree chosen by the practitioner.

Preferably, a multimer containing the modified protein retains at least 50% of the ability of a multimer of native S. solfataricus SSB to bind DNA. More preferably, a multimer containing the modified protein has at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or even more of the ability of a multimer of native S. solfataricus SSB to bind ssDNA, as measured in such assays. Gel-shift assays provide especially convenient methods of determining the degree to which any particular modified SsoSSB protein multimer retains the ability of a native SsoSSB protein multimer to bind single stranded DNA. Many other such assays are, however, known in the art and can be used at the practitioner's choice. For example, the multimers can be permitted to bind ssDNA and the fluorescence of the proteins examined by spectrophotometry. Higher degrees of binding are detected by decreased fluorescence as more amino acids of the proteins are blocked by the DNA.

Although the discussion above refers to substitutions of the native SsoSSB sequence, SEQ ID NO:1, persons of skill will be aware that it is not necessary to first synthesize or express the protein and then to modify it. Typically, the practitioner decides on the substitution or substitutions desired, and assembles a nucleic acid vector that, when expressed in a suitable host cell, such as E. coli, results in a protein with the desired sequence. Kits for engineering plasmids containing desired nucleic acid inserts and expressing such vectors in host cells are commercially available from a number of vendors, and are well known in the art.

The terms “identical” or percent “identity,” in the context of two or more polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues that are the same, when compared and aligned for maximum correspondence over a comparison window, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. When percentage of sequence identity is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated according to, e.g., the algorithm of Meyers & Miller, Computer Applic. Biol. Sci. 4:11-17 (1988), e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).

As noted, one type of substitution is termed a “conservative substitution.” One of skill will recognize that individual substitutions in a peptide, polypeptide, or protein sequence which alter a single amino acid or a small percentage of amino acids in the encoded sequence are “conservatively modified variants” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are generally considered to be conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
(see, e.g., Creighton, Proteins (1984)).

The phrase “substantially identical,” in the context of two polypeptides, refers to sequences or subsequences that have at least 60%, preferably 70%, more preferably 80%, most preferably 90-95% amino acid residue identity when aligned for maximum correspondence over a comparison window as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. This definition also refers to the complement of a test sequence, which has substantial sequence or subsequence complementarity when the test sequence has substantial identity to a reference sequence.
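The partial-credit scoring described above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, skips gapped positions, and uses a hypothetical credit of 0.5 for a conservative substitution (the text only requires a value between zero and 1). It is not the Meyers & Miller algorithm; the function names are illustrative only.

```python
# Six conservative-substitution groups, taken from the list in the text.
CONSERVATIVE_GROUPS = ("AST", "DE", "NQ", "RK", "ILMV", "FYW")


def is_conservative(a, b):
    """True if residues a and b differ but fall in the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_identity(ref, test, conservative_credit=0.5):
    """Return (strict, adjusted) percent identity for pre-aligned sequences.

    conservative_credit is an assumed partial score between 0 and 1.
    Positions with a gap ('-') in either sequence are skipped (assumption).
    """
    if len(ref) != len(test):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = identical = conservative = 0
    for a, b in zip(ref.upper(), test.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1          # identical residue scores 1
        elif is_conservative(a, b):
            conservative += 1       # conservative change scores partial credit
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    strict = 100.0 * identical / compared
    adjusted = 100.0 * (identical + conservative_credit * conservative) / compared
    return strict, adjusted
```

For the toy pair "ADKF" and "SDRL" (one identity, two conservative changes, one non-conservative change), this sketch gives a strict identity of 25% and an adjusted identity of 50%.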

One of skill in the art will recognize that two polypeptides can also be “substantially identical” if the two polypeptides are immunologically similar. Thus, overall protein structure may be similar while the primary structure of the two polypeptides displays significant variation. Therefore, a method to measure whether two polypeptides are substantially identical involves measuring the binding of monoclonal or polyclonal antibodies to each polypeptide. Two polypeptides are substantially identical if the antibodies specific for a first polypeptide bind to a second polypeptide with an affinity of at least one third of the affinity for the first polypeptide.

For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally, Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1995 Supplement) (Ausubel)).

Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al, supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)).
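The three stopping rules for word-hit extension described above can be sketched for the nucleotide case as follows. This is a simplified, ungapped, one-directional illustration and not the BLAST implementation. The function name, its arguments, and the X value of 20 are assumptions; the M=5 and N=−4 defaults are the BLASTN values quoted in the text.

```python
def extend_right(query, subject, q_pos, s_pos, reward=5, penalty=-4, x_drop=20):
    """Extend an ungapped alignment to the right from (q_pos, s_pos).

    reward and penalty correspond to M and N in the text.
    x_drop corresponds to X; 20 is an assumed illustrative value.
    Returns (best_score, best_length) of the extension.
    """
    score = 0
    best_score = 0
    best_length = 0
    length = 0
    # Rule 3: stop when the end of either sequence is reached.
    while q_pos + length < len(query) and s_pos + length < len(subject):
        if query[q_pos + length] == subject[s_pos + length]:
            score += reward
        else:
            score += penalty
        length += 1
        if score > best_score:
            best_score, best_length = score, length
        # Rule 2: stop when the cumulative score goes to zero or below.
        if score <= 0:
            break
        # Rule 1: stop when the score falls off by X from its maximum.
        if best_score - score >= x_drop:
            break
    return best_score, best_length
```

Extending "ACGTAC" against "ACGTTT" from position 0 of each, for instance, accumulates four matches before two mismatches and reports the four-residue extension as the best one. A full search would run the same procedure leftward from the seed and combine the two results.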

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5787 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two amino acid sequences would occur by chance.

“Conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in any particular described sequence.
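As an illustration of silent variations, the following sketch enumerates every RNA sequence that encodes a short peptide. The codon table is deliberately partial: it holds only the four alanine codons and the single methionine codon named above. The peptide "MAA" and the names `CODONS` and `silent_variants` are hypothetical.

```python
from itertools import product

# Partial codon table (RNA alphabet). Only the codons named in the text are
# included, so this is illustrative and not a complete genetic code.
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"],  # alanine
    "M": ["AUG"],                        # methionine
}


def silent_variants(peptide):
    """Return every RNA sequence in the partial table that encodes peptide."""
    choices = [CODONS[residue] for residue in peptide]
    return ["".join(combo) for combo in product(*choices)]


variants = silent_variants("MAA")  # hypothetical three-residue peptide
print(len(variants))               # 1 * 4 * 4 = 16 silent variants
```

Each of the 16 sequences differs at the nucleic acid level yet encodes the same polypeptide.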

EXAMPLES

The following examples are offered to illustrate, but not to limit, the claimed invention.

Example 1

This Example sets forth materials and methods used in the studies reported herein.

Alignment of protein sequences. The S. solfataricus ssb sequence (SSO2364) encoding the SSB protein (SsoSSB) was identified using BLASTP at the S. solfataricus genome website: http://www-archbac.u-psud.fr/projects/sulfolobus. The A. pernix ssb sequence (gi5105001) encoding the SSB protein (ApeSSB) was identified using BLAST at the PEDANT website: http://pedant.gsf.de/. Both open reading frames were recognized through their homology to MJ1159, encoding the Methanococcus jannaschii RPA protein (Chedin, F. et al., Trends Biochem Sci 23:273-277 (1998)). Subsequent alignments were performed using the ALIGN program at http://www.toulouse.inra.fr/multalin.html and additional features were highlighted by manual adjustment. BestFit comparisons were performed using the Wisconsin Package Version 10.1, Genetics Computer Group (GCG), Madison, Wis. and were between S. cerevisiae RPA70 single-stranded DNA-binding region 1 (gi6319321, amino acids 301-399) and the following single-stranded DNA-binding protein sequences: Archaeoglobus fulgidus (gi11497994), A. pernix (gi5105001), Bacillus subtilis (gi2127217), Escherichia coli (gi134913), Homo sapiens (gi1350579), M. jannaschii (MJ1159), Methanobacterium thermoautotrophicum (gi2622495), Pyrococcus abyssi (gi5457718), Pyrococcus horikoshii (gi3258332), S. solfataricus (SSO2364), as well as between E. coli SSB protein (gi134913) and S. solfataricus (SSO2364). The A. fulgidus sequence used was one of two identified as homologous to MJ1159 and is the most homologous to the N-terminus of the M. jannaschii protein (Chedin 1998). The M. thermoautotrophicum sequence was adjusted to account for the frameshift identified by Chedin 1998.

Strains and cultivation. S. solfataricus strain P2 (DSM 1616) (Zillig, W. et al., Arch Microbiol 125:259-269 (1980)) was the generous gift of Dennis Grogan (University of Cincinnati) and was grown at 80° C. as described (Rolfsmeier, M. et al., J Bacteriol 180:1287-1295 (1998)) at a pH of 3.0 in screw cap flasks as described (Rolfsmeier, M. et al., J Bacteriol 177:482-485 (1995)). Basal salts medium was Allen's medium (Allen M. B., Arch. Mikrobiol., 32:270-277 (1959)) as modified by Brock (Brock, T. D. et al., Arch Mikrobiol 84:54-68 (1972)) and was supplemented with tryptone to a final concentration of 0.2% (w/v). Growth was monitored spectrophotometrically at a wavelength of 540 nm. Escherichia coli strains were DH5α (φ80dlacZΔM15, endA1, recA1, hsdR17(r_(k) ⁻,m_(k) ⁺), supE44, thi-1, gyrA96, relA1, Δ(lacZYA-argF)U169); BL21(DE3) (ompT [lon] hsdS_(B) (r_(B) ⁻m_(B) ⁻; an E. coli B strain) with DE3, a λ prophage carrying the T7 RNA polymerase gene); and KLC789 (F, metA7, rha8, thyA36, amp50, deoC2, ssb-1) (Chase, J. W. et al., J Mol Biol 164:193-211 (1983)) from laboratory collections or BL21(DE3) CodonPlus ultracompetent cells from Stratagene. E. coli was propagated in LB medium (Sambrook, J. et al., Molecular cloning: a laboratory manual. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press (1989)) at 30° C. in Erlenmeyer flasks shaken at 250 rpm.

PCR amplification and cloning of the SsoSSB gene. Genomic DNA was prepared from S. solfataricus cells as previously described (Rolfsmeier, M. et al., J Bacteriol 180:1287-1295 (1998)). PCR was performed using 10 mM potassium chloride, 10 mM ammonium sulfate, 2 mM magnesium chloride, 20 mM Tris-Cl (pH 8.75), 0.1% Triton X-100, 100 μM dNTPs, 100 pmol primers, 2 ng template DNA, and 2.5 U of recombinant Pfu DNA polymerase (Stratagene). The primers for amplification of SsoSSB were: 5′-CGGGATCCCCTTTCA TTAACACATAGATTTATAAATGG-3′ (SEQ ID NO:8) (SSB-F) and 5′-CGGGATCCGGAGCAA GCTCGTATACTTTGTCTCTAGCC-3′ (SEQ ID NO:9) (SSB-R). All primer sequences were chosen based on sequence information presented at the S. solfataricus genome website: http://www-archbac.u-psud.fr/projects/sulfolobus. PCR was performed using a 55° C. annealing temperature and the resulting PCR products were digested with BamHI and ligated into the BamHI site of pUC19. Ligated molecules were transformed into DH5α as previously described (Rolfsmeier, M. et al., J Bacteriol 180:1287-1295 (1998)). Plasmids from transformants were isolated using the Qiagen midiprep system and DNA sequences were determined using BigDye dRhodamine Terminator chemistry (Perkin-Elmer Corp.) at the Division of Biological Sciences Automated DNA sequencing facility at UC Davis.

Overexpression of SsoSSB protein. Sequence information obtained from the pUC19 clones was used to design a forward PCR primer with an NdeI site at the starting ATG codon for SsoSSB. The gene sequence was re-amplified from the cloned template using the new forward primer (5′-GTGAGTCGAGTCATATGGAAG-3′) (SEQ ID NO:10) and the original reverse primer. The resulting product was digested using NdeI and BamHI prior to ligation into pET21a (Novagen) that had been digested with the same enzymes to place the gene under the control of the T7 promoter. Ligation products were transformed into the CodonPlus strain (Stratagene) and transformants were cultivated at 30° C. in LB containing 100 μg/ml ampicillin until mid-log phase.
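The placement of the cloning sites in the three primers can be checked with a short sketch. The primer sequences are those of SEQ ID NOS:8-10, with the internal whitespace of the printed sequences ignored. The recognition sequences GGATCC (BamHI) and CATATG (NdeI) are the standard ones. The label "NdeI-F" and the function name `find_sites` are hypothetical.

```python
# Standard recognition sequences for the two enzymes used in the cloning.
SITES = {"BamHI": "GGATCC", "NdeI": "CATATG"}

PRIMERS = {
    "SSB-F": "CGGGATCCCCTTTCATTAACACATAGATTTATAAATGG",   # SEQ ID NO:8
    "SSB-R": "CGGGATCCGGAGCAAGCTCGTATACTTTGTCTCTAGCC",   # SEQ ID NO:9
    "NdeI-F": "GTGAGTCGAGTCATATGGAAG",                   # SEQ ID NO:10; label is hypothetical
}


def find_sites(seq):
    """Map each enzyme whose site occurs in seq to the 0-based site offset."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for label, seq in PRIMERS.items():
    print(label, find_sites(seq))

# The ATG inside the NdeI site (CATATG) doubles as the start codon.
nde_offset = PRIMERS["NdeI-F"].find(SITES["NdeI"])
print(PRIMERS["NdeI-F"][nde_offset + 3:nde_offset + 6])  # ATG
```

Both original primers carry a BamHI site near the 5′ end, and the new forward primer carries an NdeI site whose ATG overlaps the start codon, consistent with the cloning strategy described.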

Purification of SsoSSB protein. BL21(DE3) CodonPlus cells (Stratagene) harboring the pET21a SsoSSB expression construct were grown at 30° C. in a 500 ml volume to an OD₆₀₀ of 1.0. IPTG was added to a final concentration of 1 mM and expression was allowed to continue for 2 hours. Cells were harvested by centrifugation and stored at −20° C. until processing. The frozen cell pellet was resuspended in 4 ml of 10 mM Tris-Cl (pH 7.5), 1 mM EDTA (TE) with 50 mM NaCl and sonicated to disrupt the cells. The sonicate was heat treated at 80° C. for 1 hour, and insoluble material was removed by centrifugation. Clarified sonicate was applied to a ssDNA cellulose column equilibrated in 30 mM Tris-Cl (pH 7.5), 1 mM EDTA, 1 mM DTT, and 10% glycerol; the column was washed with the same buffer containing 0.5 M NaCl and 0.75 M NaCl at room temperature. Fractions eluting at 0.75 M NaCl were pooled and dialyzed into buffer containing 20 mM Tris-Cl (pH 7.5), 1 mM DTT, 1 mM EDTA, and 10% glycerol. This material was then applied to a Resource Q column (Pharmacia) equilibrated with the same buffer at room temperature. Protein was eluted using a gradient of 50 mM NaCl to 1 M NaCl in the same buffer; the SsoSSB protein eluted from the column at approximately 60 mM NaCl. The protein was pooled and concentrated by using dry polyethylene glycol and then dialyzed against 25 mM Tris HCl (pH 7.5), 20 mM NaCl, 1 mM EDTA, 1 mM DTT, 10% spectral grade glycerol, and stored at 4° C. Protein concentrations were obtained by spectrophotometric absorbance at a wavelength of 280 nm, using an extinction coefficient of 12660 M⁻¹ cm⁻¹ as determined with the ProtParam tool at the ExPASy website (http://expasy.cbr.nrc.ca/tools/protparam.html).
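The concentration determination follows the Beer-Lambert relation, A = ε·c·l. The sketch below applies it with the extinction coefficient given above and the predicted monomer molecular weight of 16,138 reported in Example 2. The absorbance reading of 0.5, the 1 cm path length, and the function name are assumptions made for illustration.

```python
EXT_COEFF = 12660.0   # M^-1 cm^-1, extinction coefficient from the text
MONOMER_MW = 16138.0  # g/mol, predicted molecular weight from Example 2


def concentration_from_a280(a280, path_cm=1.0):
    """Return (micromolar, mg/ml) monomer concentration via Beer-Lambert."""
    molar = a280 / (EXT_COEFF * path_cm)
    return molar * 1e6, molar * MONOMER_MW  # g/L is numerically mg/ml


# Hypothetical reading of A280 = 0.5 in a 1 cm cuvette.
micromolar, mg_per_ml = concentration_from_a280(0.5)
print(round(micromolar, 1), round(mg_per_ml, 2))  # 39.5 0.64
```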

Gel filtration of SsoSSB protein. Fast protein liquid chromatography (FPLC) was performed at 4° C. using a Superose 12 column (Pharmacia) and 25 mM Tris HCl pH 7.5, 1 mM DTT, 100 mM NaCl, 1 mM EDTA as the running buffer. Molecular size standards were BSA (66 kDa), carbonic anhydrase (29 kDa), and cytochrome C (12.4 kDa) and were prepared in running buffer. A total of 10 μg of SsoSSB protein was loaded on the column in a volume of 100 μl. Elution profiles were determined by monitoring OD₂₈₀ readings and a standard curve was prepared by plotting Ve/Vo against the molecular mass of the size standards. The value of Vo was determined by elution of dextran blue from the Superose 12 column.
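A standard curve of this kind can be sketched as a least-squares line through the three size standards. The sketch assumes, as is common practice but not stated above, that Ve/Vo is fitted against the base-10 logarithm of the molecular mass. The Ve/Vo values are placeholders and not measured data. The names `fit_line` and `apparent_mass_kda` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


STANDARDS_KDA = [66.0, 29.0, 12.4]  # BSA, carbonic anhydrase, cytochrome C
VE_VO = [1.60, 1.80, 2.00]          # placeholder values, NOT measured data

slope, intercept = fit_line([math.log10(m) for m in STANDARDS_KDA], VE_VO)


def apparent_mass_kda(ve_vo):
    """Invert the standard curve to estimate an apparent mass in kDa."""
    return 10 ** ((ve_vo - intercept) / slope)


print(round(apparent_mass_kda(1.70), 1))
```

Dividing an apparent mass obtained this way by the monomer mass gives an estimate of the oligomeric state of the eluting species.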

Gel mobility-shift analysis. The 63-mer oligonucleotide 5′-ACAGCACCAATGAAATCTATTAAGCTCCTCATCGTCCGCAAAAATATCGTCACCTCAAAAGGA-3′ (SEQ ID NO:11) was end-labeled with ³²P using T4 polynucleotide kinase (NEB). SsoSSB protein was incubated at the indicated concentrations with 10 μM (nucleotides) of the ³²P-labeled oligonucleotide for 30 minutes at 75° C. in buffer containing 30 mM TrisOAc (pH 7.5), 10 mM MgOAc₂, 5 mM NaCl, 0.1 mM DTT and 50 μg/ml BSA. Increasing concentrations of linearized pUC19 were used as the dsDNA competitor as indicated in the text. Loading dye was then added and the samples were applied to a vertical 10% acrylamide gel prepared with 1×TBE buffer (0.089 M Tris-borate, 0.089 M boric acid, 0.002 M EDTA).

In vivo complementation. The SsoSSB expression vector or pET21a (empty vector) was transformed into E. coli strain KLC789 (Chase, J. W. et al., J Mol Biol 164:193-211 (1983)) containing pTara, a T7 polymerase expression vector that is inducible by arabinose addition. The pTara plasmid was the generous gift of Kathleen Mathews (Rice University) (Wycuff, D. R. et al., Anal Biochem 277:67-73 (2000)). Transformants were propagated in LB medium lacking yeast extract with 0.2% (w/v) arabinose, 100 μg/ml ampicillin, and 30 μg/ml chloramphenicol for 16 hours at 30° C. to allow phenotypic overexpression of SsoSSB protein. Control cultures were propagated identically, except 0.2% (w/v) glucose was substituted for arabinose. Cells were subcultured into fresh medium without chloramphenicol and grown at 30° C. until they were shifted to the non-permissive temperature of 43° C. Optical densities were monitored spectrophotometrically at a wavelength of 600 nm. Colony forming units (cfu) per milliliter were determined by plating serial dilutions of each timepoint in triplicate on LB medium. Plates were incubated overnight at 30° C. prior to scoring for viable counts.
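The viable-count calculation from triplicate platings can be sketched as follows. The colony counts, the dilution factor, and the plated volume are hypothetical, since none of these values is given above. The function name `cfu_per_ml` is likewise an assumption.

```python
def cfu_per_ml(colony_counts, dilution_factor, plated_volume_ml):
    """Mean colony count of replicate plates, scaled back to the culture.

    dilution_factor is the total fold dilution of the plated sample,
    for example 1e5 for a 10^-5 dilution.
    """
    mean_count = sum(colony_counts) / len(colony_counts)
    return mean_count * dilution_factor / plated_volume_ml


# Hypothetical triplicate counts from 0.1 ml of a 10^-5 dilution.
titer = cfu_per_ml([52, 47, 51], dilution_factor=1e5, plated_volume_ml=0.1)
print(f"{titer:.1e} cfu/ml")  # 5.0e+07 cfu/ml
```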

DNA strand exchange reactions. E. coli RecA protein (11 μM) was incubated with φX174 ssDNA (New England Biolabs) at a concentration of 33 μM (nucleotides) in 30 mM TrisOAc (pH 7.5), 10 mM DTT, 20 mM MgOAc, 2.5 mM ATP, and 5 μg/ml BSA at 37° C. for 10 minutes. After the addition of either 2.2 μM SsoSSB protein or E. coli SSB protein, reaction mixtures were incubated at 37° C. for another 5 min before the introduction of PstI-linearized φX174 dsDNA (New England Biolabs) at a concentration of 33 μM (nucleotides). The reaction mixtures were then incubated at 37° C. for 90 minutes and were stopped by the addition of SDS to a final concentration of 0.6% and proteinase K to a final concentration of 1 μg/ml. Deproteinization of the reaction mixtures was carried out at 37° C. for 10 minutes.

Gel electrophoresis. Agarose gels for evaluation of cloning procedures were prepared at 0.8% and run at approximately 150 V in TBE buffer prior to staining with ethidium bromide. Tricine SDS/PAGE, used to monitor protein purification, was prepared with a 4% stacking gel and a 20% separating gel as described (Price, L. B. et al., J Bacteriol 182:4951-4958 (2000)) and run at 100 V. Acrylamide gels for gel mobility-shift analysis were prepared at 10% in TBE buffer and run at 100 V for 3 hours at 65° C. prior to exposure to phosphorimaging screens and analysis with a Storm 840 PhosphorImager (Molecular Dynamics). Agarose gels for evaluation of DNA strand exchange products were prepared at 1% and run at approximately 30 V in TAE buffer (0.04 M TrisOAc, 0.002 M EDTA) for 15 hours prior to staining with ethidium bromide.

Example 2

This Example sets forth the results of studies conducted using the methods and materials set forth in the previous Example.

Crenarchaeal Homologues of ssDNA Binding Proteins.

Previous searches of genome sequences identified open reading frames with homology to human RPA70 from the euryarchaeons M. jannaschii, M. thermoautotrophicum, and A. fulgidus (Chedin, F. et al., Trends Biochem Sci 23:273-277 (1998); Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)). Each of these sequences consists of multiple ssDNA-binding domains contained in one or a pair of open reading frames, which share a significant degree of sequence homology among themselves. In one case, the protein encoded by this homologous sequence was shown to be an RPA homologue both structurally and functionally (Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)). However, there has been no corresponding homologue of RPA identified in members of the other branch of the archaeal domain, the crenarchaea. The availability of sequence information from both A. pernix (Kawarabayasi, Y. et al., DNA Res 6:83-101, 145-152 (1999)) and S. solfataricus (She, Q. et al., Proc Natl Acad Sci USA 98:7835-7840 (2001)) permitted a search for ssDNA-binding protein sequences in the crenarchaea.

A survey of genome sequences from A. pernix and S. solfataricus with the program BLAST (Altschul, S. F. et al., J Mol Biol, 215:403-410, (1990)) revealed a single small open reading frame in each genome with sequence similarity to MJ1159, the RPA homologue from M. jannaschii. Both open reading frames code for proteins that are strikingly similar to the first ssDNA-binding domain of the M. jannaschii RPA (FIG. 1 a), but are much shorter than the M. jannaschii protein. The S. solfataricus sequence (SSO2364) is 477 base pairs in length and encodes a protein of 148 amino acids while the A. pernix sequence (gi5105001) is 429 base pairs in length and encodes a protein of 143 amino acids; in contrast, the M. jannaschii protein is 645 amino acids in length. The S. solfataricus protein has a predicted pI of 9.0, and the A. pernix protein also has a predicted pI of 9.0. For comparison, the pIs of the T4 phage SSB homologue gp32, E. coli SSB, and M. jannaschii RPA are 4.8, 5.4, and 4.7, respectively. The S. solfataricus protein has 52% similarity and 26% identity with MJ1159 from amino acids 68 to 170 and the A. pernix protein has 52% similarity and 25% identity with MJ1159 from amino acids 70 to 173. The two residues conserved in all archaeal sequences correspond to amino acids involved in DNA binding in human RPA70 protein domain B (Thr 359 and Trp 361 in the hsRPA70 sequence) (Bochkarev, A. et al., Nature, 385:176-181 (1997)) (FIG. 1 a). Additionally, the S. solfataricus sequence shares two other amino acids with human RPA70 that are implicated in DNA binding, Phe 386 and Ser 396 (Bochkarev, A. et al., Nature, 385:176-181 (1997)).

The two crenarchaeal protein sequences show a remarkable level of homology to each other, exhibiting 69% similarity and 47% identity (FIG. 1). Both of the newly identified open reading frames are significantly shorter than their euryarchaeal and eukaryotic counterparts, each comprising only a single ssDNA-binding region as opposed to closely related tandem repeats (four, in the case of the M. jannaschii protein) of multiple binding regions in a larger polypeptide. Examination of nearby regions in each genome revealed no further sequences that might encode portions of a ssDNA-binding protein, suggesting either that S. solfataricus SSO2364 and A. pernix gi5105001 are the only genes responsible for producing SSB for these crenarchaeons or that there were other genes elsewhere in the genome that encoded the remainder of the subunits for an RPA-like protein. No other sequence with significant similarity to MJ1159 is apparent elsewhere in the complete genomes of either A. pernix or S. solfataricus, suggesting that these proteins may serve as the sole SSBs in these organisms.

The Crenarchaeal Proteins Structurally Resemble E. coli SSB Protein.

In contrast to other archaeal SSB homologues, which resemble eukaryotic RPAs, the overall architecture of the crenarchaeal versions appears to be distinctly different. Euryarchaeal RPA homologues are composed of multiple DNA-binding domains within one or a pair of proteins and presumably function in monomeric or heterodimeric forms, respectively (FIG. 2). The open reading frames from both A. pernix and S. solfataricus, however, are distinguished by a single core ssDNA-binding domain of about 100 amino acids in length, identical to that observed in eubacterial ssDNA-binding proteins. This suggests that the quaternary structure of the crenarchaeal SSB protein might be analogous to the eubacterial version and could be tetrameric. Additionally, the crenarchaeal proteins lack the C-terminal zinc finger of the euryarchaeal RPAs and instead contain a number of acidic residues in this region. This is analogous to the C-terminal structure of E. coli SSB, which is required for protein-protein interactions (Chase, J. W. et al., J Biol Chem 260:7214-7218 (1985); Kelman, Z. et al., EMBO J 17:2436-2449 (1998); Williams, K. R. et al., J Biol Chem 258:3346-3355 (1983)). Also absent in the crenarchaeal sequences is an approximately 100 residue N-terminal region found in euryarchaeal RPAs that is believed to be involved in protein binding (Wold, M. S. Annu Rev Biochem 66:61-92 (1997)).

The SSB protein homologue of phage T4, gp32, has little sequence homology to bacterial SSBs but displays both structural similarity with these proteins and an acidic C-terminus (Shamoo, Y. et al., Nature 376:362-366 (1995)). Much like gp32, the crenarchaeal SSBs share minimal sequence homology with E. coli SSB. BestFit alignment of the crenarchaeal sequences with E. coli SSB shows 35% similarity and 31% identity over only 48 amino acids for SsoSSB and 43% similarity and 38% identity over only 60 amino acids for ApeSSB. However, the crenarchaeal proteins do appear to share structural similarity with the eubacterial protein. Despite reduced sequence homology with E. coli SSB, homology-dependent modeling of the expected structure for S. solfataricus SSB reveals striking similarity between the two proteins in the core DNA binding region; a similar analysis was used to confirm the previous identification of the euryarchaeal RPA homologues. The modeling shows strong evolutionary structural conservation of the OB-fold (residues 32-71), which is implicated in DNA binding, as well as the α-helix, which is involved in subunit interactions (Webster et al., FEBS Lett, 411:313-316 (1997)). Equivalent modeling using hsRPA70 also demonstrated strong structural conservation, especially with the RPA-B subdomain. These results confirm that the open reading frame from S. solfataricus encodes a ssDNA-binding protein.

Expression and Purification of the SSB Homologue from S. solfataricus.

To determine if SSO2364 from S. solfataricus did indeed encode a functional SSB protein homologue, we purified and characterized the candidate protein. The SSO2364 open reading frame was cloned into the pET21a bacterial expression vector and subsequently heterologously expressed in E. coli. The bacterial strain carried a plasmid encoding three rare tRNA genes in an effort to reduce translational pausing, as the predicted protein sequence contains 11 codons that are rare in E. coli, four of which are present in the final 50 bases of the open reading frame. The expressed protein is soluble and was purified to near homogeneity by first heat-treating the cell sonicate at 80° C. to denature all mesophilic proteins. This step was followed by affinity chromatography on ssDNA-cellulose, and by anion exchange chromatography on Resource Q. The recombinant protein eluted from ssDNA-cellulose at 0.75 M NaCl, which is identical to the salt concentration necessary for elution of E. coli SSB (Lohman, T. M. et al., Biochemistry 25:21-25 (1986)). In contrast, elution of either yeast or M. jannaschii RPA from the same matrix requires 1.5 M NaCl with the addition of 40% ethylene glycol, 1.3 M potassium thiocyanate, or 1.5 M sodium thiocyanate (Henricksen, L. A. et al., J Biol Chem 269:24203-24208 (1994); Heyer, W. D. et al., Embo J 9:2321-2329 (1990); Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)), suggesting that the S. solfataricus SSB (SsoSSB) protein is chemically more similar to the bacterial protein than it is to RPA. The SsoSSB protein eluted at approximately 60 mM NaCl from Resource Q and pooled fractions yielded essentially pure protein (greater than 98% purity based on Coomassie stained gels). Examination of the purified protein by SDS/PAGE revealed a single band with a molecular weight of approximately 16 kDa, in close agreement with the predicted molecular weight of 16,138 Da as determined from the amino acid sequence (FIG. 3).

The SsoSSB Protein Forms Tetramers in Solution.

It was of interest to determine if the purified S. solfataricus protein would multimerize in solution. Analytical gel filtration was used to determine the native form of the recombinant protein. Examination of the SsoSSB protein in solution revealed that it was present in three distinct species: monomeric, dimeric, and tetrameric forms, at 18 kDa, 36 kDa, and 62 kDa, respectively (FIG. 4 a). As demonstrated by a representative elution profile (FIG. 4 b), the composition of the purified material was primarily tetrameric, while somewhat less dimeric protein was present. A significantly smaller quantity of the monomeric form of SsoSSB protein was observed. This result indicates that the SsoSSB protein indeed forms tetramers in solution. In contrast, the M. jannaschii RPA protein does not multimerize in solution (Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)).

SsoSSB Protein Binds ssDNA.

To determine the occluded binding site size of the protein and to evaluate the activity of the SsoSSB protein, we performed acrylamide gel mobility-shift assays with a radioactively labeled single-stranded oligonucleotide that was 63 nucleotides in length (FIG. 5). When a fixed quantity of radiolabeled oligonucleotide was incubated with increasing quantities of protein, a band of reduced mobility was observed. The apparent mobility further decreased upon addition of more protein, finally achieving a constant mobility after the addition of 2 μM protein. The absence of discrete species implied rapid equilibration between bound and free forms of the complexes, and consistent with this possibility, more discrete species were visualized when electrophoresis was performed more rapidly (at high voltage). The addition of linear pUC19 DNA to a 100-fold molar (nucleotide) excess did not alter the mobility-shift pattern, demonstrating that the SsoSSB protein has at least a 100-fold greater affinity for ssDNA than for dsDNA. Saturation was achieved at a ratio of approximately one SSB protein molecule to 5 nucleotides. The M. jannaschii RPA protein has a site size of 15 to 20 nucleotides (Kelly, T. J. et al., Proc Natl Acad Sci USA 95:14634-14639 (1998)) whereas the site size of E. coli SSB protein, depending on solution conditions, varies from 8 to 16 nucleotides per monomer. The apparent site size of SsoSSB protein is similar to the 7 nucleotide site size observed for phage T4 SSB protein, gp32. Binding of SsoSSB protein to ssDNA does not appear to be a cooperative process under these conditions, as intermediately shifted species are evident instead of the approximately 2-state transitions that typify cooperative binding. Rather, a steady reduction in the mobility of protein-DNA complexes occurs as protein concentration is increased, suggesting that binding of SsoSSB protein to ssDNA is distributive in nature. The observed band-retardation pattern is likely the result of a combination of this non-cooperative binding and rapid equilibration of protein-DNA complexes during electrophoresis.
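The site-size estimate follows from simple arithmetic on the figures reported here: 10 μM (nucleotides) of the 63-mer in the assay and a constant mobility reached at 2 μM protein. The sketch below reproduces that arithmetic. The function name `site_size` is hypothetical, and the count of monomers per oligonucleotide is an inference from those figures rather than a measured value.

```python
def site_size(nucleotide_um, saturating_protein_um):
    """Nucleotides occluded per protein monomer at saturation."""
    return nucleotide_um / saturating_protein_um


OLIGO_LENGTH = 63  # nucleotides, from the text

size = site_size(10.0, 2.0)                     # 10 uM nt, saturation at 2 uM
monomers_per_oligo = int(OLIGO_LENGTH // size)  # whole monomers that fit

print(size, monomers_per_oligo)  # 5.0 12
```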

SsoSSB Protein is a Functional SSB Homologue.

In E. coli, SSB is an essential protein. A number of mutations were used to elucidate the function of E. coli SSB protein; one mutation is the temperature sensitive mutation called ssb-1. The ssb-1 mutation is an alteration of amino acid 55 from a histidine to a tyrosine that is believed to destabilize the tetrameric SSB protein complex upon shifting temperature from 30° C. to the non-permissive temperature of 43° C., resulting in a lethal phenotype (Chase, J. W. et al., J Mol Biol 164:193-211 (1983)). Overexpression of wild-type E. coli SSB protein encoded on a plasmid can overcome the lethality of the mutation (Chase, J. W. et al., J Mol Biol 164:193-211 (1983)) as can the SSB protein of bacteriophage P1 (Lehnherr, H. et al., J Bacteriol 181:6463-6468 (1999)). To test the in vivo functionality of SsoSSB, the protein was overexpressed in the E. coli strain KLC789, which carries the ssb-1 allele. The SsoSSB plasmid was transformed into KLC789 along with a plasmid carrying an arabinose-inducible T7 polymerase gene. Cells were grown in the presence of arabinose for 16 hours at the permissive temperature to allow over-expression of the SsoSSB protein, while control cultures were cultivated in the presence of glucose. Cells were then shifted to the non-permissive temperature and monitored for further growth (FIG. 6 a). The presence of the open reading frame encoding SsoSSB protein in pET21a permitted continued growth of the ssb-1 mutant strain while the pET21a vector alone was unable to rescue the lethal phenotype. Rescued growth required the presence of arabinose, as control experiments with glucose showed no increase in optical density that would be consistent with continued growth. To verify that the cells containing the SsoSSB plasmid were indeed still growing, viable counts for each timepoint were determined by plating (FIG. 6 b). Following the temperature shift, the control cultures continued to grow for a brief period and then dramatically lost viability. In contrast, the cells containing the SsoSSB plasmid displayed a continuous increase in viable counts. These results indicate that SsoSSB protein is capable of replacing E. coli SSB protein in vivo and that it functions at 43° C.

SsoSSB Protein Stimulates DNA Strand Exchange by E. coli RecA Protein.

It was previously shown that heterologously expressed SSB proteins can stimulate DNA strand exchange mediated by E. coli RecA protein (Egner, C. et al., J Bacteriol 169:3422-3428 (1987)). To demonstrate the functionality of the SsoSSB protein in at least one nucleic acid metabolic function in vitro, DNA strand exchange reactions were performed (FIG. 7). Purified SsoSSB protein was capable of substituting for E. coli SSB protein in DNA strand exchange reactions mediated by E. coli RecA protein at 37° C. using homologous φX174 ss- and dsDNA. Clear enhancement of DNA strand exchange, as determined by production of nicked circular product, was observed with SsoSSB protein (FIG. 7, lane 3). Interestingly, the M. jannaschii RPA protein can also promote RecA-mediated DNA-strand exchange. The reduced amount of nicked circular product in reactions containing SsoSSB protein relative to levels with E. coli SSB protein may be the result of performing the experiment at temperatures where SsoSSB protein is not as effective. The optimal temperature for the SsoSSB protein is expected to be near the growth temperature of S. solfataricus, which is 80° C. As yet, no enhancement of strand exchange activity through addition of SsoSSB protein to S. solfataricus RadA protein-containing reactions at high temperature has been observed, though this may be a consequence of inappropriate assay conditions or may signify a post-synaptic role for SsoSSB protein at physiological temperatures.

Example 3

This Example discusses the results of the studies set forth above.

Divergence between the archaeal and eukaryotic lineages is more recent than the divergence of the bacterial and eukaryotic/archaeal groups (Olsen, G. J. et al., Cell 89:991-994 (1997)). Accordingly, a number of features shared by eukaryotes and archaea, but not bacteria, including replication and transcription proteins, were likely obtained after evolutionary divergence of these three groups. In contrast, features found in archaea and bacteria but not in eukaryotes, including morphological attributes (a lack of organelles and nucleus), coupling of transcription and translation, conjugative mechanisms, and a single circular genome, may be reminiscent of a more ancient, shared ancestor.

However, despite the logic of this viewpoint, the evolutionary behavior of the archaea is not so simplistic. Our results indicate that, in contrast to the canonical expectation that all archaeal ssDNA binding proteins would be RPA-like, the crenarchaeal SSBs share sequence homology with eukaryal RPAs but structural homology with bacterial SSBs. Homology-dependent modeling of SsoSSB demonstrates conservation of the OB-fold, indicative of ssDNA binding proteins. Both E. coli SSB and SsoSSB proteins elute from ssDNA-cellulose at the same salt concentration, and both form tetramers in solution. The apparent binding site size is approximately 5 nucleotides, which is consistent with the site size observed for T4 gp32 protein and the low site size binding mode of E. coli SSB protein, but is one-fourth that of M. jannaschii RPA protein, which has four DNA-binding domains. Another feature that makes the crenarchaeal SSB protein more eubacterial is that SsoSSB protein and ApeSSB protein are missing the zinc finger motif near the C-terminus which is present in both euryarchaeal and eukaryal RPAs, suggesting that this motif was acquired after the separation of the two archaeal branches. Furthermore, the acidic residues in the SsoSSB protein C-terminal region are similar to those found in bacterial proteins and may have been maintained from the last shared bacterial and archaeal ancestor. SsoSSB protein resembles E. coli SSB chemically in that they both elute from ssDNA-cellulose at 0.75 M NaCl while the euryarchaeal and eukaryal RPAs require higher salt concentrations and the addition of ethylene glycol, potassium thiocyanate, or sodium thiocyanate for elution from this matrix. Finally, SsoSSB protein can functionally substitute for E. coli SSB protein both in vivo and in vitro. In vivo, overexpression of the SsoSSB protein can overcome the lethal ssb-1 temperature-sensitive mutation in E. coli; in vitro, the SsoSSB protein can replace the E. coli SSB protein in DNA strand exchange reactions mediated by RecA protein.

Concurrent with our efforts, the same crenarchaeal protein was identified by another laboratory using biochemical criteria (Wadsworth, R. I. et al., Nucleic Acids Res 29:914-920 (2001)). No multimerization of the protein was demonstrated, contrary to our observation. In our experience, however, heating of the SsoSSB protein is necessary for formation of the tetrameric structure as observed by gel filtration and for activity in DNA strand-exchange. The absence of this heating step in the work of Wadsworth and White could account for their failure to observe tetramers.

Overall, the archaeal ssDNA binding proteins show more sequence similarity to eukaryotic RPA proteins than to E. coli SSB protein. However, the common sequences and DNA-binding domain utilization hint at a conserved evolutionary relationship between RPA and SSB (FIG. 2). Strong homology among the archaeal proteins suggests a common evolutionary origin, and the crenarchaeal SSB proteins may represent a link to ancient ssDNA-binding proteins. The crenarchaeal SSB protein, being one of the simplest in structure, may represent the earliest form of single-stranded DNA-binding proteins involved in recombination, and it may serve as a model for understanding structure, function, and evolution of the conserved core ssDNA-binding domain.

Crenarchaea and euryarchaea are distinguished from each other not only by single-stranded DNA-binding proteins as described here, but also by double-stranded DNA-binding proteins. While eukaryotic-like histone proteins have been identified and extensively studied in members of the euryarchaea (Sandman, K. et al., Arch Microbiol 173:165-169 (2000)), no such homologues are apparent in members of the crenarchaea. Instead, crenarchaea maintain sequences that code for small, basic dsDNA-binding proteins (Agback, P. et al., Nat Struct Biol 5:579-584 (1998); Faguy, D. M. et al., Curr Biol 9:R883-886 (1999)). These proteins are proposed to be comparable to histone-like proteins in eubacteria, especially Sso7d, which was shown to be analogous to HU (Lopez-Garcia, P. et al., Nucleic Acids Res 26:2322-2328 (1998)). It appears that the crenarchaea have maintained both single-stranded and double-stranded DNA-binding proteins that are more similar to those found in bacteria than they are to eukaryotic homologues. Crenarchaeal dsDNA-binding proteins and SSB may have evolved separately from mechanisms employed by eukaryotes or, alternatively, crenarchaea diverged earlier, prior to the co-evolution of eukaryotic-like histones and RPA. The use of a eubacterial-like SSB may necessitate a eubacterial DNA-binding protein for the compaction of DNA. The discovery of an SSB homologue in the crenarchaea that is significantly more similar to eubacterial SSB both structurally and biochemically than it is to either euryarchaeal or eukaryotic RPA further establishes the evolutionary distance between the two archaeal phyla.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

What is claimed is:
 1. An isolated multimer, wherein each unit of said multimer has at least 70% sequence identity to SEQ ID NO:1, and wherein said multimer binds single stranded DNA.
 2. An isolated multimer of claim 1, wherein each unit of said multimer has at least 80% sequence identity to SEQ ID NO:1.
 3. An isolated multimer of claim 1, wherein each unit of said multimer has at least 90% sequence identity to SEQ ID NO:1.
 4. An isolated multimer of claim 1, wherein each unit has the sequence of SEQ ID NO:1.
 5. An isolated multimer of claim 1, wherein said multimer is a tetramer.
 6. A method of performing nucleic acid amplification, said method comprising contacting a single stranded DNA with a multimeric protein, wherein each unit of said multimeric protein has at least 70% sequence identity to SEQ ID NO:1, and wherein said multimeric protein binds single stranded DNA.
 7. A method of claim 6, wherein each unit of said multimeric protein has at least 80% sequence identity to SEQ ID NO:1.
 8. A method of claim 6, wherein each unit of said multimeric protein has at least 90% sequence identity to SEQ ID NO:1.
 9. A method of claim 6, wherein each unit of said multimeric protein has the sequence of SEQ ID NO:1.
 10. A method of claim 6, wherein said nucleic acid amplification is selected from the group consisting of polymerase chain reaction, ligase chain reaction, transcription-based amplification system, and self-sustained sequence replication system.
 11. A method of claim 10, wherein the method of nucleic acid amplification is polymerase chain reaction.
 12. A method for performing nucleic acid engineering, comprising contacting single stranded DNA with a multimeric protein, wherein each unit of said multimeric protein has at least 70% sequence identity to SEQ ID NO:1, and wherein said multimeric protein binds single stranded DNA.
 13. A method of claim 12, wherein each unit of said multimeric protein has at least 80% sequence identity to SEQ ID NO:1, and wherein said multimeric protein binds single stranded DNA.
 14. A method of claim 12, wherein each unit of said multimeric protein has at least 90% sequence identity to SEQ ID NO:1, and wherein said multimeric protein binds single stranded DNA.
 15. A method of claim 12, wherein each unit of said multimeric protein has the sequence of SEQ ID NO:1, and wherein said multimeric protein binds single stranded DNA.
 16. A method of claim 12, wherein said nucleic acid engineering is selected from the group consisting of PCR-based DNA sequencing, recombination mediated cloning, PCR-mediated gene replacement, PCR-mediated recombination, RT-PCR cDNA synthesis, and in vitro sequence mutagenesis.